Why Table Ground-Truthing is Hard

Authors

  • Jianying Hu
  • Ramanujan S. Kashi
  • Daniel P. Lopresti
  • Gordon T. Wilfong
  • George Nagy
Abstract

The principle that for every document analysis task there exists a mechanism for creating well-defined ground-truth is a widely held tenet. Past experience with standard datasets providing ground-truth for character recognition and page segmentation tasks supports this belief. In the process of attempting to evaluate several table recognition algorithms we have been developing, however, we have uncovered a number of serious hurdles connected with the ground-truthing of tables. This problem may, in fact, be much more difficult than it appears. We present a detailed analysis of why table ground-truthing is so hard, including the notions that there may exist more than one acceptable “truth” and/or incomplete or partial “truths.”
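The abstract's point that a table may have more than one acceptable "truth", or only a partial one, changes how a recognizer's output can be scored. The following is a minimal sketch under stated assumptions, not the authors' evaluation method. It assumes a hypothetical representation in which a table is a set of cells, each identified by its inclusive grid span (top_row, left_col, bottom_row, right_col). Cells are scored by exact-match F1, the best score over several acceptable ground truths is kept, and a partial truth is modeled as one that labels only a rectangular region. The names `Cell`, `Region`, `cell_f1` and `best_score` are all hypothetical.

```python
from typing import Iterable, Optional, Set, Tuple

# Hypothetical representation: a cell is its inclusive grid span
# (top_row, left_col, bottom_row, right_col).
Cell = Tuple[int, int, int, int]
# A labeled region for a partial truth, using the same span convention.
Region = Tuple[int, int, int, int]


def inside(cell: Cell, region: Region) -> bool:
    """True if the cell lies entirely within the labeled region."""
    t, l, b, r = cell
    rt, rl, rb, rr = region
    return rt <= t and rl <= l and b <= rb and r <= rr


def cell_f1(predicted: Set[Cell], truth: Set[Cell],
            region: Optional[Region] = None) -> float:
    """Exact-match cell F1 against a single ground truth.

    If `region` is given, the truth is treated as partial: predicted cells
    outside the region are neither rewarded nor penalized.
    """
    if region is not None:
        predicted = {c for c in predicted if inside(c, region)}
    if not predicted and not truth:
        return 1.0
    hits = len(predicted & truth)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(truth)
    return 2 * precision * recall / (precision + recall)


def best_score(predicted: Set[Cell],
               truths: Iterable[Tuple[Set[Cell], Optional[Region]]]) -> float:
    """Score against every acceptable truth and keep the most favorable one."""
    return max((cell_f1(predicted, t, region) for t, region in truths),
               default=0.0)


if __name__ == "__main__":
    # A 2x2 table whose top row is a header. Truth A reads it as one
    # spanning cell; truth B reads it as two separate cells.
    pred = {(0, 0, 0, 0), (0, 1, 0, 1), (1, 0, 1, 0), (1, 1, 1, 1)}
    truth_a = {(0, 0, 0, 1), (1, 0, 1, 0), (1, 1, 1, 1)}
    truth_b = {(0, 0, 0, 0), (0, 1, 0, 1), (1, 0, 1, 0), (1, 1, 1, 1)}
    print(round(cell_f1(pred, truth_a), 3))  # → 0.571
    print(best_score(pred, [(truth_a, None), (truth_b, None)]))  # → 1.0
```

Taking the maximum is the most lenient policy, since a result is credited if it matches any interpretation a human would accept. Stricter alternatives, such as averaging over truths or weighting them, are equally plausible and would rank systems differently, which is one concrete way the ambiguity described above surfaces in evaluation.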




Published: 2001